---
id: vllm
title: vLLM
sidebar_label: vLLM
---

`vLLM` is a high-performance inference engine for LLMs that serves an OpenAI-compatible API. `deepeval` can connect to a running `vLLM` server to evaluate locally hosted models.

### Command Line

1. Launch your `vLLM` server and make sure it exposes the OpenAI-compatible API (a minimal launch command is sketched below). The default base URL for a local vLLM server is `http://localhost:8000/v1/`.
2. Run the following command to point `deepeval` at the server:

```bash
deepeval set-local-model --model-name=<model_name> \
    --base-url="http://localhost:8000/v1/" \
    --api-key=<api-key>
```
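For reference, step 1 usually comes down to a single command. The sketch below assumes `vllm` is installed and uses a placeholder model name; substitute the model you actually want to serve:

```bash
# Start an OpenAI-compatible server on the default port (8000).
# The model name is only an example.
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000
```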

:::tip
You can use any value for `--api-key` if authentication is not enforced.
:::

:::tip Persisting settings
You can persist CLI settings with the optional `--save` flag.
See [Flags and Configs -> Persisting CLI settings](/docs/evaluation-flags-and-configs#persisting-cli-settings-with---save).
:::
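Before running evaluations, you can sanity-check that the server is reachable at the configured base URL by listing its models over the OpenAI-compatible API. The request below uses a dummy key, which is enough when authentication is not enforced:

```bash
# The returned model id should match what you pass to --model-name.
curl http://localhost:8000/v1/models \
  -H "Authorization: Bearer dummy-key"
```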

### Reverting to OpenAI

To disable the local model and return to OpenAI:

```bash
deepeval unset-local-model
```

:::info
For advanced setup or deployment options (e.g. multi-GPU serving, Hugging Face models), see the [vLLM documentation](https://vllm.ai/).
:::
